1. Introduction and Background

Let $(x_1, Y_1), (x_2, Y_2), \ldots$ be independent bivariate data following the mechanism

(1.1)  $Y_i = m(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots$

where the $x_i$ are fixed design points, the $\varepsilon_i$ are zero mean random variables (rv) and $m(x)$ is the unknown regression function. Let us assume that $n$ observations have been made with $0 \le x_1 \le \ldots \le x_n = 1$ and that $Y_i$ is distributed as a rv with cumulative distribution function $F(y; x_i)$ and with probability density function $f(y; x_i) \in \mathcal{F} = \{f(y; x) : 0 < x < 1\}$.

The nonparametric regression function estimation problem is to estimate

$m(x) = \int y \, f(y; x) \, dy$

given $n$ observations $(x_1, Y_1), \ldots, (x_n, Y_n)$.

Many estimators of $m(\cdot)$ in this "fixed design sampling" model (1.1) have been considered (Priestley and Chao, 1972; Reinsch, 1967; Wold, 1974). Most of these authors assume an "i.i.d. error structure", i.e. $f(y; x) = f_0(y - m(x))$ for some fixed density $f_0$. We consider here kernel-type estimators

(1.2)  $\bar{m}_n(x) = \sum_{i=1}^{n} \alpha_i(x) Y_i, \quad 0 < x < 1$

where the sequence of (concentrating) weights $\alpha_i(x)$ is derived from a kernel function to be defined later. Nonparametric regression function estimators of this kind were introduced by Priestley and Chao (1972) and further discussed by Benedetti (1977). As can be seen from these early papers and more recently from Gasser and Müller (1979), estimators of the kind defined in (1.2) are usually biased. Using a suggestion of Bartlett (1963) for the bias reduction of density estimators, Gasser and Müller gave an approximate expression for the bias and variance of such estimators (see Lemma 1).

The purpose of this paper is first to define a generalized jackknife estimator of $m(x)$, as introduced by Schucany and Sommers (1977) for kernel-type density estimators. It is then shown that the generalized jackknife, which is a method of forming linear combinations of kernel regression function estimators, reduces the bias of $\bar{m}_n(x)$. The generalized jackknife technique, applied to kernel regression function estimators, thus exhibits the same properties as in density estimation: Schucany and Sommers (1977) show that a jackknifed kernel estimator of a density $f(x)$ reduces bias (asymptotically) in the way of Bartlett's (1963) original suggestion.

We will also show that, for finite sample size in the regression setting considered here, the jackknifed version of $\bar{m}_n(x)$ may have a larger mean square error (MSE) than the original estimator when the weights $\alpha_i(x)$ are improperly chosen. A proper choice of these weights is usually impossible in practical situations, so the experimenter is always confronted with the risk of selecting a "wrong" sequence of weights. Since the MSE is widely accepted as a criterion for measuring the accuracy of nonparametric estimators (Epanechnikov (1969), Rosenblatt (1971), among others), this (negative) result shows that in certain situations the generalized jackknife method, applied to kernel regression function estimators, may actually fail to improve the accuracy of the estimator.

We now present some of the choices of the weight functions $\alpha_i(\cdot)$ in (1.2). For instance, Priestley and Chao (1972) suggested using

(1.3)  $\alpha_i(x) = h^{-1} K((x - x_i)/h) \, [x_i - x_{i-1}]$

with $x_0 = 0$, $h = h(n)$ a sequence of bandwidths tending to zero and a kernel function $K(\cdot)$, to be defined in (1.5). Gasser and Müller (1979) considered the following weights

(1.4)  $\alpha_i(x) = h^{-1} \int_{s_{i-1}}^{s_i} K((x - u)/h) \, du$

with a sequence of interpolating points $x_i \le s_i \le x_{i+1}$, $s_0 = 0$, $s_n = 1$.
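The weights (1.4) can be evaluated in closed form once a kernel is fixed. The following Python sketch is illustrative only. It assumes the Epanechnikov kernel (the kernel used in the example of Section 2), an equispaced design $x_i = i/n$ with $s_i = x_i$, and noise-free constant responses; the helper names `epanechnikov_cdf`, `gm_weights` and `kernel_estimate` are hypothetical and do not come from the paper.

```python
def epanechnikov_cdf(t):
    """Integral of K_E(u) = 0.75 * (1 - u**2) from -1 up to t (t clipped to [-1, 1])."""
    t = max(-1.0, min(1.0, t))
    return 0.5 + 0.75 * (t - t ** 3 / 3.0)


def gm_weights(x, s, h):
    """Weights (1.4): alpha_i(x) = (1/h) * integral over [s_{i-1}, s_i] of K((x - u)/h) du.

    `s` holds the interpolating points s_0 = 0 <= s_1 <= ... <= s_n = 1.
    The substitution v = (x - u)/h turns each integral into a difference
    of kernel distribution function values.
    """
    return [
        epanechnikov_cdf((x - s[i - 1]) / h) - epanechnikov_cdf((x - s[i]) / h)
        for i in range(1, len(s))
    ]


def kernel_estimate(x, s, y, h):
    """Estimator (1.2): weighted sum of the responses Y_1, ..., Y_n."""
    return sum(a * yi for a, yi in zip(gm_weights(x, s, h), y))


# Assumed toy setting: n equispaced design points and a constant regression function.
n = 50
design = [i / n for i in range(1, n + 1)]   # x_i = i / n
s = [0.0] + design                          # s_0 = 0, s_i = x_i, s_n = 1
y_const = [2.0] * n                         # Y_i = m(x_i) with m = 2 and no noise
weights = gm_weights(0.5, s, h=0.2)
estimate = kernel_estimate(0.5, s, y_const, h=0.2)
```

For an interior point whose distance to the boundary exceeds $h$, the weights telescope to one, so a constant regression function is reproduced without bias in this setting.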


Cheng and Lin (1981) showed that with $x_i = s_i$ the weights defined through (1.4) or (1.3) give asymptotically the same consistency rate. For convenience we therefore restrict our attention to weights as defined in (1.4). We consider only even kernel functions $K(\cdot)$ which vanish outside $[-A, A]$, are continuous and satisfy, for some integer $r$,

(1.5)  $\int_{-A}^{A} u^j K(u) \, du = 0, \quad j = 1, \ldots, r-1,$
       $\int_{-A}^{A} u^r K(u) \, du = r! \, A(K, r) < \infty.$

These conditions on $K$ have been assumed by previous authors in the density estimation setting or in the regression function estimation case (Wegman, 1972a; Gasser and Müller, 1979). The assumption of finite support of $K$ is not stringent. By reading through the proofs, it will be clear that the results also hold for kernels with infinite support. We restrict ourselves to kernels with finite support only for computational convenience. In practical applications every kernel will be of finite support, due to lower bounds on machine precision.

We further assume for the remainder of the paper that the $x_i$ are asymptotically equispaced: $\sup_{1 \le j \le n} |s_j - s_{j-1}| = O(n^{-1})$. We then have from Gasser and Müller (1979) the following result on the mean square error.

Lemma 1

Let $m^{(p)}(\cdot)$ be uniformly bounded with $p = 2t$, $t > 0$ and let $K$ be a kernel function satisfying (1.5) with $r < p$. Assume that $\sigma^2(x) = \int [y - m(x)]^2 f(y; x) \, dy$ is uniformly continuous for $0 < x < 1$. Then the leading term of the MSE of $\bar{m}_n(x)$, $0 < x < 1$, is

$(nh)^{-1} \beta_K \sigma^2(x) + \left[ \sum_{s=1}^{t} h^{2s} m^{(2s)}(x) A(K, 2s) \right]^2,$

where $\beta_K = \int K^2(u) \, du$.


A similar MSE decomposition into variance and bias parts holds for density estimators $f_n(x) = (nh)^{-1} \sum_{i=1}^{n} K((x - X_i)/h)$ (Parzen, 1962), but with derivatives of the density $f(\cdot)$ in the bias instead of derivatives of $m(\cdot)$. The first part of the following section is therefore quite analogous to Schucany and Sommers (1977).


2.  Does  the  jackknifed  estimate  improve  the  kernel  estimate? 

We shall construct here combinations of kernel estimators, using the generalized jackknife method of Schucany and Sommers (1977). Note that, in the context of the generalized jackknife, the "leave-out" techniques associated with the ordinary jackknife will not be employed. We first state a lemma giving the convergence rate of the bias of the jackknifed estimator and then discuss an example for which the jackknife estimator fails to improve the MSE of the original kernel estimate.

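Define for $\ell = 1, 2$ the estimators $\bar{m}_{\ell,n}(x) = \sum_{i=1}^{n} \alpha_i^{(\ell)}(x) Y_i$, where

$\alpha_i^{(\ell)}(x) = h_\ell^{-1} \int_{s_{i-1}}^{s_i} K_\ell((x - u)/h_\ell) \, du, \quad h_1 = h_1(n), \; h_2 = h_2(n),$

and $h_\ell$ denote different sequences of bandwidths and $K_1$, $K_2$ kernel functions with $r_1 = r_2 = 2$ respectively. The generalized jackknife estimate of $m(x)$ is then defined as

(2.1)  $G[\bar{m}_{1,n}, \bar{m}_{2,n}] = (1 - R)^{-1} [\bar{m}_{1,n}(x) - R \, \bar{m}_{2,n}(x)], \quad R \ne 1.$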


It is assumed for the remainder of the paper that $\sigma^2(x)$, as defined in Lemma 1, is uniformly continuous for $0 < x < 1$. The proof of the following lemma is evident in view of Lemma 1.


Lemma 2

Suppose that $m^{(p)}(\cdot)$ is uniformly bounded with $p = 2t$, $t \ge 2$ and let $K_1$, $K_2$ be kernels satisfying (1.5) with $r_1, r_2 < p$. Then the leading bias term of $G[\bar{m}_{1,n}(x), \bar{m}_{2,n}(x)]$ is

(2.2)  $(1 - R)^{-1} \sum_{s=1}^{t} [h_1^{2s} A(K_1, 2s) - R \, h_2^{2s} A(K_2, 2s)] \, m^{(2s)}(x).$

The reduction of bias in (2.2) is now made possible by a suitable choice of the balancing constant $R$.

If we set $R = R_n = (h_1^2 / h_2^2) \, A(K_1, 2) / A(K_2, 2)$, then

$h_1^2 A(K_1, 2) - R \, h_2^2 A(K_2, 2) = 0,$

and the first bias term, containing $m^{(2)}(x)$, is eliminated. We thus indeed have an estimator with a faster bias rate, and moreover we could have produced $G[\bar{m}_{1,n}(x), \bar{m}_{2,n}(x)]$ with the single kernel

$K^*(u) = [K_1(u) - v c_n^3 K_2(c_n u)] / [1 - v c_n^2],$

where $v = A(K_1, 2)/A(K_2, 2)$ and $c = c_n = h_1(n)/h_2(n)$. Note that, in contrast to $K_1$, $K_2$, the kernel $K^*(\cdot)$ may still depend on $n$, but satisfies (1.5) for all $n$ with $r = 4$, as is shown in Schucany and Sommers (1977), p. 421. Note also that the calculations in that paper, showing that $K^*$ belongs to the class of kernel functions (see (1.5)) with $r = 4$, do not depend on the density estimation setting.
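The moment conditions claimed for the single kernel $K^*$ can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $K_1 = K_2$ equal to the Epanechnikov kernel (so $v = 1$), picks the arbitrary value $c = 0.5$, and uses a plain midpoint rule. The function names are hypothetical.

```python
def k_e(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def k_star(u, c):
    """Single kernel K*(u) = [K_1(u) - v c^3 K_2(c u)] / [1 - v c^2] with K_1 = K_2 = K_E, v = 1."""
    return (k_e(u) - c ** 3 * k_e(c * u)) / (1.0 - c ** 2)


def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule on [lo, hi]; accurate enough for these piecewise polynomials."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))


def moment(j, c):
    """j-th moment of K*; the support is [-1, 1] joined with [-1/c, 1/c]."""
    half_width = max(1.0, 1.0 / c)
    return integrate(lambda u: u ** j * k_star(u, c), -half_width, half_width)


c = 0.5                      # assumed bandwidth ratio h_1 / h_2, chosen only for illustration
zeroth = moment(0, c)        # should be close to 1: K* integrates to one
second = moment(2, c)        # should be close to 0: the r = 2 moment is cancelled
fourth = moment(4, c)        # non-zero fourth moment, so r = 4 in (1.5)
```

In this setting the fourth moment works out to $-3/(35c^2)$, that is $4!$ times the coefficient $-c^{-2}/280$ obtained from (2.2) with $R = c^2$.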

As is shown in Lemma 1 and Lemma 2, the bias terms of both $\bar{m}_n$ and $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$ still depend on $m(\cdot)$ through the derivatives of the regression function. An optimal choice of $h$ with respect to the MSE would thus involve knowledge of the derivatives of the regression function. A conservative strategy for the experimenter could therefore be to ascribe a small amount of smoothness to $m(\cdot)$. The experimenter could assume, for instance, that the second derivative of $m(\cdot)$ exists and is continuous. On the other hand, the experimenter could do better if even the fourth derivative exists, by choosing a kernel function $K$ satisfying (1.5) with $r = 4$. This justifies the use of a generalized jackknife kernel estimator, as defined in (2.1). If, in fact, $m(\cdot)$ is smoother than we expected, say we started with a kernel $K$ with $r = 2$, the generalized jackknife estimator would give us the faster vanishing bias term with kernel functions $K_1$ and $K_2$ with $r = 2$ in the class defined in (1.5).

We now investigate the properties of $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$ in a small example. For this define $K_1 = K_2 = K_E$, where

$K_E(u) = \tfrac{3}{4}(1 - u^2), \quad |u| \le 1,$
$K_E(u) = 0, \quad |u| > 1,$

is the Epanechnikov (1969) kernel. For optimality questions of this particular kernel, we refer to Rosenblatt (1971). This kernel function obviously satisfies (1.5) with $r = 2$. The following calculations for the variance remain valid also in the density estimation setting, since $\beta_K$ or $\beta_{K^*}$ occur also as variance factors there (Parzen, 1962). By straightforward computations it is easy to obtain:

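$\beta_{K_E} = \int K_E^2(u) \, du = 3/5$

$A(K_E, 2) = 1/10$
$A(K_E, 4) = 1/280.$

(2.3)  $\beta_{K^*} = \int K^{*2}(u) \, du$
       $= [1 - c^2]^{-2} \{ \int K_E^2(u) \, du + c^5 \int K_E^2(u) \, du - 2 c^3 \int K_E(u) K_E(cu) \, du \}$
       $= [1 - c^2]^{-2} \{ 3/5 + (9/10) c^5 - (3/2) c^3 \}$
       $= (9/10) [c^3 + 2c^2 + (4/3) c + 2/3] / [c + 1]^2.$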
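The constants above and the closed form (2.3) can be reproduced with exact rational arithmetic. The following sketch is only a check under assumptions: polynomials are stored as coefficient lists (a representation chosen here, not taken from the paper), and the direct integration of the cross term over $[-1, 1]$ is carried out for $0 < c < 1$, the range in which it reproduces the intermediate expression in (2.3).

```python
from fractions import Fraction


def integrate_poly(coeffs):
    """Exact integral over [-1, 1] of sum_k coeffs[k] * u^k (odd powers integrate to zero)."""
    return sum(Fraction(2, k + 1) * a for k, a in enumerate(coeffs) if k % 2 == 0)


def multiply(p, q):
    """Coefficient list of the product of two polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def epanechnikov(scale=Fraction(1)):
    """Coefficients of K_E(scale * u) = 3/4 - 3/4 * scale^2 * u^2, valid where |scale * u| <= 1."""
    return [Fraction(3, 4), Fraction(0), -Fraction(3, 4) * scale ** 2]


K = epanechnikov()
beta_KE = integrate_poly(multiply(K, K))      # variance factor of K_E
A2 = integrate_poly([0, 0] + K) / 2           # A(K_E, 2): second moment divided by 2!
A4 = integrate_poly([0, 0, 0, 0] + K) / 24    # A(K_E, 4): fourth moment divided by 4!


def beta_star_direct(c):
    """Variance factor of K* from the second line of (2.3), for 0 < c < 1."""
    cross = integrate_poly(multiply(K, epanechnikov(c)))   # product is supported on [-1, 1]
    return (beta_KE + c ** 5 * beta_KE - 2 * c ** 3 * cross) / (1 - c ** 2) ** 2


def beta_star_closed(c):
    """Last line of (2.3)."""
    return Fraction(9, 10) * (c ** 3 + 2 * c ** 2 + Fraction(4, 3) * c + Fraction(2, 3)) / (c + 1) ** 2


half = Fraction(1, 2)
direct_value = beta_star_direct(half)
closed_value = beta_star_closed(half)
limit_value = beta_star_closed(Fraction(1))   # value of the closed form at c = 1
```

Using `Fraction` keeps every step exact, so the agreement between the direct and the closed form is an identity check rather than a floating point comparison.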

Note that $\beta_{K^*} \to 9/8$ as $c \to 1$, which is considerably higher than $\beta_{K_E} = 3/5$. This behavior of $K^*$ can also be seen from Table 1 in Schucany and Sommers (1977) for a normal density kernel and $R = .99$. It is therefore apparent that some caution must be exercised in selecting $c$, which is the same as choosing the balancing factor $R$ or $h_1$ and $h_2$. To compensate for the trade-off between bias and variance, the faster rate of the bias of $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$ suggests choosing $h_1 > h$. The calculation of the variance factors $\beta_{K_E}$ and $\beta_{K^*}$ in (2.3) suggests the choice of $h_1 = (\beta_{K^*}/\beta_{K_E}) h \approx 15h/8$ to balance the variance part of the MSE. How does this choice of bandwidths now affect the MSE of $\bar{m}_n(x)$ and $G[\bar{m}_{1,n}(x), \bar{m}_{2,n}(x)]$, given that $m^{(4)}(\cdot)$ exists and is uniformly bounded? By Lemma 1, the leading term of the MSE of $\bar{m}_n(x)$ is given by

(2.4)  $(nh)^{-1} \beta_{K_E} \sigma^2(x) + \{ h^2 m^{(2)}(x)/10 + h^4 m^{(4)}(x)/280 \}^2,$

whereas the principal term of the MSE of $G[\bar{m}_{1,n}(x), \bar{m}_{2,n}(x)]$ is

(2.5)  $(nh_1)^{-1} \beta_{K^*} \sigma^2(x) + \{ -c^{-2} h_1^4 m^{(4)}(x)/280 \}^2.$
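The bias part of (2.5) follows from (2.2) once $R$ is fixed. As a hedged illustration, the sketch below evaluates the coefficients of $m^{(2)}(x)$ and $m^{(4)}(x)$ in (2.2) for $K_1 = K_2 = K_E$ with exact fractions. The bandwidth values and the function name `jackknife_bias_coefficients` are assumptions made for this example only.

```python
from fractions import Fraction

# A(K_E, 2) and A(K_E, 4) for the Epanechnikov kernel, as stated in the text.
A = {2: Fraction(1, 10), 4: Fraction(1, 280)}


def jackknife_bias_coefficients(h1, h2, R):
    """Coefficients of m^(2)(x) and m^(4)(x) in the leading bias (2.2) with K_1 = K_2 = K_E."""
    return {
        order: (h1 ** order * A[order] - R * h2 ** order * A[order]) / (1 - R)
        for order in (2, 4)
    }


# Assumed bandwidths, chosen only to make the arithmetic concrete.
h1 = Fraction(3, 10)
h2 = Fraction(2, 5)
c = h1 / h2                       # c = h_1 / h_2
R = c ** 2                        # R = (h_1^2 / h_2^2) * A(K_1, 2) / A(K_2, 2) with equal kernels
coefficients = jackknife_bias_coefficients(h1, h2, R)
```

With this choice of $R$ the coefficient of $m^{(2)}(x)$ vanishes and the coefficient of $m^{(4)}(x)$ equals $-c^{-2} h_1^4 / 280$, which is the term squared in (2.5).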

Assume now that $\alpha = m^{(2)}(x)/10$ and $\beta = m^{(4)}(x)/280$ are positive and $c \approx 1$, and assume in addition that by choosing $h_1 \approx 15h/8$ the variance parts of (2.4) and (2.5) are approximately the same. With this selection of $h_1$, the bias terms now read

$\mathrm{bias}^2(\bar{m}_n(x)) = (h^2 \alpha + h^4 \beta)^2$

$\mathrm{bias}^2(G[\bar{m}_{1,n}, \bar{m}_{2,n}]) \approx \{ -50625 \beta h^4 / 4096 \}^2 \approx 152.76 \, \beta^2 h^8.$

Comparing these bias terms, which now depend only on $h$, shows that with the "wrong" choice of $h_1$ and $h_2$ the generalized jackknife estimator may fail to improve $\bar{m}_n(x)$ in MSE. More precise computations yield that, if $h$ is chosen to fulfill

$|(\beta/\alpha) h^2 - .00658| > .0814,$

then

$\mathrm{MSE}\{G[\bar{m}_{1,n}, \bar{m}_{2,n}]\} > \mathrm{MSE}\{\bar{m}_n(x)\}.$
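The numerical constants in this comparison can be recovered from the two bias expressions alone. The sketch below assumes, as in the text, that the variance parts are equal, so that the MSE comparison reduces to a comparison of absolute biases; writing $t = (\beta/\alpha) h^2$ is a notational shortcut introduced here, and the function name is hypothetical.

```python
from fractions import Fraction

# With h_1 = 15h/8 and c close to 1, bias(G) is about -K_RATIO * beta * h^4.
K_RATIO = Fraction(15, 8) ** 4


def jackknife_bias_is_larger(t):
    """True if |bias(G)| > |bias(m_n)| for t = (beta / alpha) * h^2 and alpha != 0.

    bias(m_n) = alpha * h^2 * (1 + t), bias(G) is about -K_RATIO * alpha * h^2 * t.
    """
    return K_RATIO * abs(t) > abs(1 + t)


# Solving K_RATIO * |t| = |1 + t| gives the two boundary points of the region.
upper = 1 / (K_RATIO - 1)     # boundary for t > 0
lower = -1 / (K_RATIO + 1)    # boundary for t < 0
center = float(upper + lower) / 2
half_width = float(upper - lower) / 2
squared_ratio = float(K_RATIO) ** 2
```

Outside the interval from `lower` to `upper` the jackknifed estimator has the larger squared bias, which is the region described by the inequality above, with `center` and `half_width` matching its two constants up to rounding.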


Some additional remarks should be made. The example may seem somewhat artificially constructed, since we restrict our attention to one particular kernel function. This restriction is justified by the computations of Rosenblatt (1971), Table 1, p. 1821, showing the relative insensitivity of the MSE to different kernels (see also Wegman (1972b)). It is interesting that, if $m^{(2)}(x_0)$ happens to be zero for some $x_0$, the new jackknife estimator drastically loses MSE accuracy, provided that $h_1$ was chosen in such a way that the variance parts of both $\bar{m}_n(x)$ and $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$ are approximately equal ($c \approx 1$). A proper choice of $R$ and $h_1$ is in practice not obtainable, since we have no knowledge about the derivatives of the regression function. It is also not possible in general to compute the regions of bandwidths where $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$ actually improves $\bar{m}_n(x)$, since these require the constants $\alpha$ and $\beta$.

3.  Conclusion 

Under the proper conditions, several of the original type of kernel regression estimators proposed by Priestley and Chao (1972) can be combined to form generalized jackknife estimators which have a faster bias rate. The new estimates produce an improved rate of the MSE due to cancellations of bias terms. The generalized jackknife estimator, as defined here for the regression function estimation setting, thus achieves the same properties as a similar estimator introduced by Schucany and Sommers for density function estimation. In a small example, it is investigated how the new estimator performs when compared to ordinary kernel regression function estimates. It is shown there that an improper choice of $R$, the balancing factor between $\bar{m}_{1,n}$ and $\bar{m}_{2,n}$, may introduce an inflation effect on the MSE of $G[\bar{m}_{1,n}, \bar{m}_{2,n}]$. A jackknifing technique for kernel estimators of regression functions should therefore be performed cautiously, with a proper inspection of the parameters involved.